Overview

Dataset statistics

Number of variables            10
Number of observations         20640
Missing cells                  207
Missing cells (%)              0.1%
Duplicate rows                 0
Duplicate rows (%)             0.0%
Total size in memory           1.5 MiB
Average record size in memory  76.0 B
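The two missing-value percentages follow directly from the counts. The arithmetic check below assumes "Missing cells (%)" is taken over all cells (observations × variables). It also uses the fact, stated in the warnings below, that all 207 missing cells belong to total_bedrooms:

```python
n_observations = 20640
n_variables = 10
missing_cells = 207  # all of them in total_bedrooms

# Missing share over every cell in the table: 207 / (20640 * 10)
missing_cells_pct = 100 * missing_cells / (n_observations * n_variables)

# Missing share within the total_bedrooms column alone: 207 / 20640
total_bedrooms_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells (%): {missing_cells_pct:.1f}%")                    # 0.1%
print(f"total_bedrooms missing (%): {total_bedrooms_missing_pct:.1f}%")  # 1.0%
```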

Variable types

Numeric      9
Categorical  1

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
population is highly correlated with households (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
total_bedrooms has 207 (1.0%) missing values (Missing)

Reproduction

Analysis started        2021-02-05 11:49:00.108334
Analysis finished       2021-02-05 11:49:23.500953
Duration                23.39 seconds
Software version        pandas-profiling v2.10.0
Download configuration  config.yaml

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct      844
Distinct (%)  4.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          -119.5697045
Minimum       -124.35
Maximum       -114.31
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    -124.35
5th percentile             -122.47
Q1                         -121.8
Median                     -118.49
Q3                         -118.01
95th percentile            -117.08
Maximum                    -114.31
Range                      10.04
Interquartile range (IQR)  3.79
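Range and IQR are plain differences of the quantiles above: Range = Maximum - Minimum and IQR = Q3 - Q1. The sketch below assumes the quantiles come from linear interpolation between order statistics, which is the default in pandas and NumPy. `quantile` is a hypothetical helper written only for illustration:

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, for 0 <= q <= 1."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q            # fractional index into the sorted data
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


# Range and IQR for longitude, using the values reported above
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01
print(round(maximum - minimum, 2))     # 10.04
print(round(q3 - q1, 2))               # 3.79
```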

Descriptive statistics

Standard deviation               2.003531724
Coefficient of variation (CV)    -0.01675618195
Kurtosis                         -1.330152366
Mean                             -119.5697045
Median Absolute Deviation (MAD)  1.28
Skewness                         -0.297801208
Sum                              -2467918.7
Variance                         4.014139367
Monotonicity                     Not monotonic
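Two of these statistics are easy to misread.

- The coefficient of variation is the standard deviation divided by the mean, so longitude's negative mean produces a negative CV. CV is only meaningful for strictly positive, ratio-scale data.
- The MAD is assumed here to be the median absolute deviation from the median, with no scaling constant.

The sketch below uses Python's `statistics` module and plugs in the reported longitude figures. `median_absolute_deviation` is a hypothetical helper:

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (no scaling constant)."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)


# Reported longitude statistics
std, mean = 2.003531724, -119.5697045
cv = std / mean              # negative because the mean is negative
variance = std ** 2          # variance is the squared standard deviation

print(f"CV = {cv:.6f}")                  # CV = -0.016756
print(f"Variance = {variance:.6f}")      # Variance = 4.014139
```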
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
-118.31               162     0.8%
-118.3                160     0.8%
-118.29               148     0.7%
-118.27               144     0.7%
-118.32               142     0.7%
-118.28               141     0.7%
-118.35               140     0.7%
-118.36               138     0.7%
-118.19               135     0.7%
-118.25               128     0.6%
Other values (834)    19202   93.0%

Minimum 5 values

Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.49               1       < 0.1%
-114.55               1       < 0.1%
-114.56               1       < 0.1%
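The common-values table lists the ten most frequent values and folds the remainder into a single "Other values (k)" row. The sketch below reproduces that bookkeeping with `collections.Counter`. The display rule for very small shares is an assumption inferred from these tables: a count of 12 shows as 0.1%, while a count of 10 shows as "< 0.1%". Both helpers are hypothetical:

```python
from collections import Counter


def format_frequency(count, total):
    """Percentage with one decimal; shares that would round to 0.0% become '< 0.1%'."""
    pct = 100 * count / total
    return "< 0.1%" if round(pct, 1) == 0 else f"{pct:.1f}%"


def value_count_table(values, top=10):
    """Top-`top` values by count, plus one 'Other values (k)' row for the rest."""
    total = len(values)
    counts = Counter(values)
    rows = [(value, n, format_frequency(n, total)) for value, n in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, format_frequency(other, total)))
    return rows


# Frequencies from the longitude tables: 162 of 20640 rows, and a single row
print(format_frequency(162, 20640))   # 0.8%
print(format_frequency(1, 20640))     # < 0.1%
```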

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      862
Distinct (%)  4.2%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          35.63186143
Minimum       32.54
Maximum       41.95
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    32.54
5th percentile             32.82
Q1                         33.93
Median                     34.26
Q3                         37.71
95th percentile            38.96
Maximum                    41.95
Range                      9.41
Interquartile range (IQR)  3.78

Descriptive statistics

Standard deviation               2.135952397
Coefficient of variation (CV)    0.05994501302
Kurtosis                         -1.117759781
Mean                             35.63186143
Median Absolute Deviation (MAD)  1.23
Skewness                         0.4659530037
Sum                              735441.62
Variance                         4.562292644
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
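A fixed-size-bin histogram splits [minimum, maximum] into 50 equal-width intervals. For latitude, each bin is therefore (41.95 - 32.54) / 50 ≈ 0.188 degrees wide. The sketch below assumes NumPy-style semantics: every bin is half-open except the last, which also includes the maximum. `equal_width_histogram` is a hypothetical helper, and the five sample values are illustrative only:

```python
def equal_width_histogram(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        if width == 0:
            idx = 0                        # all values identical: one bin holds everything
        else:
            idx = int((x - lo) / width)
            idx = min(idx, bins - 1)       # the maximum falls into the last (closed) bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = equal_width_histogram([32.54, 33.0, 34.06, 34.06, 41.95], bins=5)
print(counts)   # [4, 0, 0, 0, 1]
```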
Common values

Value                 Count   Frequency (%)
34.06                 244     1.2%
34.05                 236     1.1%
34.08                 234     1.1%
34.07                 231     1.1%
34.04                 221     1.1%
34.09                 212     1.0%
34.02                 208     1.0%
34.1                  203     1.0%
34.03                 193     0.9%
33.93                 181     0.9%
Other values (852)    18477   89.5%

Minimum 5 values

Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 3       < 0.1%
32.56                 10      < 0.1%
32.57                 18      0.1%
32.58                 26      0.1%

Maximum 5 values

Value                 Count   Frequency (%)
41.95                 2       < 0.1%
41.92                 1       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct      52
Distinct (%)  0.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          28.63948643
Minimum       1
Maximum       52
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    1
5th percentile             8
Q1                         18
Median                     29
Q3                         37
95th percentile            52
Maximum                    52
Range                      51
Interquartile range (IQR)  19

Descriptive statistics

Standard deviation               12.58555761
Coefficient of variation (CV)    0.4394477408
Kurtosis                         -0.8006288536
Mean                             28.63948643
Median Absolute Deviation (MAD)  10
Skewness                         0.0603306376
Sum                              591119
Variance                         158.3962604
Monotonicity                     Not monotonic
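Every numeric variable in this report is "Not monotonic": in row order, its values sometimes increase and sometimes decrease. The sketch below checks this using the non-strict definition (equal neighbours allowed), mirroring pandas' `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`. `monotonicity` is a hypothetical helper:

```python
def monotonicity(values):
    """Classify a sequence as increasing, decreasing, or not monotonic (non-strict)."""
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# housing_median_age in the first five sample rows: 41, 21, 52, 52, 52
print(monotonicity([41, 21, 52, 52, 52]))   # Not monotonic
```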
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
52                    1273    6.2%
36                    862     4.2%
35                    824     4.0%
16                    771     3.7%
17                    698     3.4%
34                    689     3.3%
26                    619     3.0%
33                    615     3.0%
18                    570     2.8%
25                    566     2.7%
Other values (42)     13153   63.7%

Minimum 5 values

Value                 Count   Frequency (%)
1                     4       < 0.1%
2                     58      0.3%
3                     62      0.3%
4                     191     0.9%
5                     244     1.2%

Maximum 5 values

Value                 Count   Frequency (%)
52                    1273    6.2%
51                    48      0.2%
50                    136     0.7%
49                    134     0.6%
48                    177     0.9%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      5926
Distinct (%)  28.7%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          2635.763081
Minimum       2
Maximum       39320
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    2
5th percentile             620.95
Q1                         1447.75
Median                     2127
Q3                         3148
95th percentile            6213.2
Maximum                    39320
Range                      39318
Interquartile range (IQR)  1700.25

Descriptive statistics

Standard deviation               2181.615252
Coefficient of variation (CV)    0.8276977802
Kurtosis                         32.630927
Mean                             2635.763081
Median Absolute Deviation (MAD)  797
Skewness                         4.147343451
Sum                              54402150
Variance                         4759445.106
Monotonicity                     Not monotonic
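total_rooms is strongly right-skewed (skewness ≈ 4.15) and heavy-tailed (kurtosis ≈ 32.6). The kurtosis figures in this report must be excess kurtosis, where a normal distribution scores 0: longitude's value is negative, which raw kurtosis can never be.

The sketch below assumes the report uses the same bias-corrected estimators as pandas' `Series.skew()` and `Series.kurt()`. These are the adjusted Fisher-Pearson skewness and the sample excess kurtosis. Both helper names are hypothetical:

```python
import math


def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness; needs n >= 3 and non-constant data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(xs):
    """Bias-corrected sample excess kurtosis; needs n >= 4 and non-constant data."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    a = (n + 1) * n * (n - 1) / ((n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * s4 / s2 ** 2 - b


print(skewness([1, 2, 3]))              # 0.0 (symmetric data)
print(excess_kurtosis([1, 2, 3, 4]))    # approximately -1.2
```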
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
1527                  18      0.1%
1582                  17      0.1%
1613                  17      0.1%
2127                  16      0.1%
2053                  15      0.1%
1607                  15      0.1%
1471                  15      0.1%
1717                  15      0.1%
1703                  15      0.1%
1722                  15      0.1%
Other values (5916)   20482   99.2%

Minimum 5 values

Value                 Count   Frequency (%)
2                     1       < 0.1%
6                     1       < 0.1%
8                     1       < 0.1%
11                    1       < 0.1%
12                    1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
39320                 1       < 0.1%
37937                 1       < 0.1%
32627                 1       < 0.1%
32054                 1       < 0.1%
30450                 1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct      1923
Distinct (%)  9.4%
Missing       207
Missing (%)   1.0%
Infinite      0
Infinite (%)  0.0%
Mean          537.8705525
Minimum       1
Maximum       6445
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    1
5th percentile             137
Q1                         296
Median                     435
Q3                         647
95th percentile            1275.4
Maximum                    6445
Range                      6444
Interquartile range (IQR)  351

Descriptive statistics

Standard deviation               421.3850701
Coefficient of variation (CV)    0.7834321252
Kurtosis                         21.98557506
Mean                             537.8705525
Median Absolute Deviation (MAD)  162
Skewness                         3.459546332
Sum                              10990309
Variance                         177565.3773
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
280                   55      0.3%
331                   51      0.2%
345                   50      0.2%
393                   49      0.2%
343                   49      0.2%
394                   48      0.2%
328                   48      0.2%
348                   48      0.2%
272                   47      0.2%
309                   47      0.2%
Other values (1913)   19941   96.6%
(Missing)             207     1.0%

Minimum 5 values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     5       < 0.1%
4                     7       < 0.1%
5                     6       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
6445                  1       < 0.1%
6210                  1       < 0.1%
5471                  1       < 0.1%
5419                  1       < 0.1%
5290                  1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      3888
Distinct (%)  18.8%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          1425.476744
Minimum       3
Maximum       35682
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    3
5th percentile             348
Q1                         787
Median                     1166
Q3                         1725
95th percentile            3288
Maximum                    35682
Range                      35679
Interquartile range (IQR)  938

Descriptive statistics

Standard deviation               1132.462122
Coefficient of variation (CV)    0.7944444737
Kurtosis                         73.55311639
Mean                             1425.476744
Median Absolute Deviation (MAD)  440
Skewness                         4.935858227
Sum                              29421840
Variance                         1282470.457
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
891                   25      0.1%
1227                  24      0.1%
1052                  24      0.1%
761                   24      0.1%
850                   24      0.1%
825                   23      0.1%
999                   22      0.1%
1005                  22      0.1%
782                   22      0.1%
781                   21      0.1%
Other values (3878)   20409   98.9%

Minimum 5 values

Value                 Count   Frequency (%)
3                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
8                     4       < 0.1%
9                     2       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      1815
Distinct (%)  8.8%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          499.5396802
Minimum       1
Maximum       6082
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    1
5th percentile             125
Q1                         280
Median                     409
Q3                         605
95th percentile            1162
Maximum                    6082
Range                      6081
Interquartile range (IQR)  325

Descriptive statistics

Standard deviation               382.3297528
Coefficient of variation (CV)    0.7653641301
Kurtosis                         22.05798806
Mean                             499.5396802
Median Absolute Deviation (MAD)  151
Skewness                         3.410437712
Sum                              10310499
Variance                         146176.0399
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
306                   57      0.3%
335                   56      0.3%
386                   56      0.3%
282                   55      0.3%
429                   54      0.3%
375                   53      0.3%
297                   51      0.2%
284                   51      0.2%
340                   50      0.2%
362                   50      0.2%
Other values (1805)   20107   97.4%

Minimum 5 values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     3       < 0.1%
3                     4       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
6082                  1       < 0.1%
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4930                  1       < 0.1%

median_income
Real number (ℝ≥0)

Distinct      12928
Distinct (%)  62.6%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          3.870671003
Minimum       0.4999
Maximum       15.0001
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    0.4999
5th percentile             1.60057
Q1                         2.5634
Median                     3.5348
Q3                         4.74325
95th percentile            7.300305
Maximum                    15.0001
Range                      14.5002
Interquartile range (IQR)  2.17985

Descriptive statistics

Standard deviation               1.899821718
Coefficient of variation (CV)    0.4908249026
Kurtosis                         4.952524102
Mean                             3.870671003
Median Absolute Deviation (MAD)  1.0642
Skewness                         1.646656702
Sum                              79890.6495
Variance                         3.60932256
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
3.125                 49      0.2%
15.0001               49      0.2%
2.875                 46      0.2%
4.125                 44      0.2%
2.625                 44      0.2%
3.875                 41      0.2%
3                     38      0.2%
3.375                 38      0.2%
3.625                 37      0.2%
4                     37      0.2%
Other values (12918)  20217   98.0%

Minimum 5 values

Value                 Count   Frequency (%)
0.4999                12      0.1%
0.536                 10      < 0.1%
0.5495                1       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
15.0001               49      0.2%
15                    2       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%

median_house_value
Real number (ℝ≥0)

Distinct      3842
Distinct (%)  18.6%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          206855.8169
Minimum       14999
Maximum       500001
Zeros         0
Zeros (%)     0.0%
Memory size   161.3 KiB

Quantile statistics

Minimum                    14999
5th percentile             66200
Q1                         119600
Median                     179700
Q3                         264725
95th percentile            489810
Maximum                    500001
Range                      485002
Interquartile range (IQR)  145125

Descriptive statistics

Standard deviation               115395.6159
Coefficient of variation (CV)    0.55785531
Kurtosis                         0.3278702429
Mean                             206855.8169
Median Absolute Deviation (MAD)  68400
Skewness                         0.9777632739
Sum                              4269504061
Variance                         1.331614816 × 10^10
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
500001                965     4.7%
137500                122     0.6%
162500                117     0.6%
112500                103     0.5%
187500                93      0.5%
225000                92      0.4%
350000                79      0.4%
87500                 78      0.4%
275000                65      0.3%
150000                64      0.3%
Other values (3832)   18862   91.4%

Minimum 5 values

Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 4       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%

Maximum 5 values

Value                 Count   Frequency (%)
500001                965     4.7%
500000                27      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%

ocean_proximity
Categorical

Distinct      5
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   80.7 KiB
<1H OCEAN    9136
INLAND       6551
NEAR OCEAN   2658
NEAR BAY     2290
ISLAND       5

Length

Max length     10
Median length  9
Mean length    8.064922481
Min length     6
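The length statistics follow from the five category counts alone.

- The mean is the count-weighted average string length.
- The median is the length of the middle row once all 20640 rows are ordered by length.

The worked check below expands the counts into one length per row, which is cheap at this size:

```python
import statistics

# Category counts for ocean_proximity, as reported above
value_counts = {"<1H OCEAN": 9136, "INLAND": 6551, "NEAR OCEAN": 2658,
                "NEAR BAY": 2290, "ISLAND": 5}

# One entry per row: the length of that row's category string
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

print(max(lengths), min(lengths))               # 10 6
print(statistics.median(lengths))               # 9.0
print(round(sum(lengths) / len(lengths), 9))    # 8.064922481
```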

Characters and Unicode

Total characters     166460
Distinct characters  16
Distinct categories  4
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
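Because every row contributes its full string, the character statistics for ocean_proximity can be rebuilt from the category counts. The sketch below uses the standard `unicodedata` module. Python's standard library does not expose the Unicode script property, so only general categories and the single-block (ASCII) check are shown:

```python
import unicodedata
from collections import Counter

# Category counts for ocean_proximity, as reported above
value_counts = {"<1H OCEAN": 9136, "INLAND": 6551, "NEAR OCEAN": 2658,
                "NEAR BAY": 2290, "ISLAND": 5}

# Each character occurrence is weighted by the number of rows holding that string
char_counts = Counter()
for value, n in value_counts.items():
    for ch in value:
        char_counts[ch] += n

# Aggregate by Unicode general category, e.g. "Lu" = Uppercase Letter, "Zs" = Space Separator
category_counts = Counter()
for ch, n in char_counts.items():
    category_counts[unicodedata.category(ch)] += n

total = sum(char_counts.values())
print(total)                                       # 166460
print(len(char_counts))                            # 16
print(dict(category_counts))
print(all(ord(ch) < 128 for ch in char_counts))    # True: everything is in the ASCII block
```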

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  NEAR BAY
2nd row  NEAR BAY
3rd row  NEAR BAY
4th row  NEAR BAY
5th row  NEAR BAY

Value                 Count   Frequency (%)
<1H OCEAN             9136    44.3%
INLAND                6551    31.7%
NEAR OCEAN            2658    12.9%
NEAR BAY              2290    11.1%
ISLAND                5       < 0.1%

[Figure: Histogram of lengths of the category]

Words

Value                 Count   Frequency (%)
ocean                 11794   34.0%
1h                    9136    26.3%
inland                6551    18.9%
near                  4948    14.2%
bay                   2290    6.6%
island                5       < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Most occurring categories

Value                 Count   Frequency (%)
Uppercase Letter      134104  80.6%
Space Separator       14084   8.5%
Math Symbol           9136    5.5%
Decimal Number        9136    5.5%

Most frequent character per category

Uppercase Letter

Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Space Separator

Value                 Count   Frequency (%)
(space)               14084   100.0%

Math Symbol

Value                 Count   Frequency (%)
<                     9136    100.0%

Decimal Number

Value                 Count   Frequency (%)
1                     9136    100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 134104  80.6%
Common                32356   19.4%

Most frequent character per script

Latin

Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Common

Value                 Count   Frequency (%)
(space)               14084   43.5%
<                     9136    28.2%
1                     9136    28.2%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 166460  100.0%

Most frequent character per block

ASCII

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Interactions

[Figure: pairwise interaction plots for the nine numeric variables (72 plots)]

Correlations

[Figure: correlation heatmap]

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, this means the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
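This definition translates directly into code. The stdlib sketch below is illustrative; the helper name `pearson_r` is ours. The 1/n normalisation factors cancel between numerator and denominator, so they are omitted:

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # sqrt(sxx * syy) equals sqrt(sxx) * sqrt(syy); the 1/n factors cancel
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(xs, [3 * x + 2 for x in xs]))   # 1.0: the slope's size does not matter
print(pearson_r(xs, [-0.5 * x for x in xs]))    # -1.0: only its sign does
```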
[Figure: correlation heatmap]

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r does. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
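In other words, ρ is Pearson's r applied to the ranks. The sketch below assumes tied values receive the average of the ranks they span, which is the usual convention. Helper names are illustrative:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))   # 1.0: monotonic but not linear
print(average_ranks([10, 20, 20, 30]))          # [1.0, 2.5, 2.5, 4.0]
```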
[Figure: correlation heatmap]

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. In the absence of ties, τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
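A direct O(n²) sketch of this tie-free definition (τ-a) follows. A pair is concordant when X and Y move in the same direction, and discordant when they move in opposite directions. Tied pairs count as neither.

As far as we know, pandas delegates Kendall's τ to SciPy, whose default is the tie-adjusted τ-b. This sketch therefore matches the report only for tie-free data. `kendall_tau_a` is an illustrative name:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; intended for data without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# 6 pairs: 5 concordant, 1 discordant (the swapped middle pair), so tau = 4 / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```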
[Figure: correlation heatmap]

Phik (φk)

Phik (φk) is a recently introduced, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure: nullity dendrogram]
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
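The count and matrix displays both start from a boolean nullity mask that is True wherever a value is missing. The stdlib sketch below derives the per-column count of present values and a text rendering of the matrix. It uses a hypothetical three-row extract whose values are invented for illustration; as in this dataset, only total_bedrooms has gaps:

```python
# Hypothetical three-row extract; None marks a missing value
rows = [
    {"total_rooms": 1000, "total_bedrooms": 200.0, "population": 500},
    {"total_rooms": 1500, "total_bedrooms": None, "population": 700},
    {"total_rooms": 1200, "total_bedrooms": 250.0, "population": 600},
]
columns = list(rows[0])

# Nullity matrix: one boolean per cell, True where the value is missing
nullity = [[row[col] is None for col in columns] for row in rows]

# Count of non-missing values per column
present = {col: sum(not flags[i] for flags in nullity) for i, col in enumerate(columns)}

print(present)   # {'total_rooms': 3, 'total_bedrooms': 2, 'population': 3}
for flags in nullity:
    # '#' = value present, '.' = value missing
    print("".join("." if missing else "#" for missing in flags))
```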

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0      -122.23    37.88     41                  880          129.0           322         126         8.3252         452600              NEAR BAY
1      -122.22    37.86     21                  7099         1106.0          2401        1138        8.3014         358500              NEAR BAY
2      -122.24    37.85     52                  1467         190.0           496         177         7.2574         352100              NEAR BAY
3      -122.25    37.85     52                  1274         235.0           558         219         5.6431         341300              NEAR BAY
4      -122.25    37.85     52                  1627         280.0           565         259         3.8462         342200              NEAR BAY
5      -122.25    37.85     52                  919          213.0           413         193         4.0368         269700              NEAR BAY
6      -122.25    37.84     52                  2535         489.0           1094        514         3.6591         299200              NEAR BAY
7      -122.25    37.84     52                  3104         687.0           1157        647         3.1200         241400              NEAR BAY
8      -122.26    37.84     42                  2555         665.0           1206        595         2.0804         226700              NEAR BAY
9      -122.25    37.84     52                  3549         707.0           1551        714         3.6912         261100              NEAR BAY

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630  -121.32    39.29     11                  2640         505.0           1257        445         3.5673         112000              INLAND
20631  -121.40    39.33     15                  2655         493.0           1200        432         3.5179         107200              INLAND
20632  -121.45    39.26     15                  2319         416.0           1047        385         3.1250         115600              INLAND
20633  -121.53    39.19     27                  2080         412.0           1082        382         2.5495         98300               INLAND
20634  -121.56    39.27     28                  2332         395.0           1041        344         3.7125         116800              INLAND
20635  -121.09    39.48     25                  1665         374.0           845         330         1.5603         78100               INLAND
20636  -121.21    39.49     18                  697          150.0           356         114         2.5568         77100               INLAND
20637  -121.22    39.43     17                  2254         485.0           1007        433         1.7000         92300               INLAND
20638  -121.32    39.43     18                  1860         409.0           741         349         1.8672         84700               INLAND
20639  -121.24    39.37     16                  2785         616.0           1387        530         2.3886         89400               INLAND